Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
نویسندگان
چکیده
We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.
منابع مشابه
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (m...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملبرچسبگذاری ادات سخن زبان فارسی با استفاده از مدل شبکۀ فازی
Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
متن کاملPart-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-ofspeech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Condi...
متن کاملSembler: Ensembling Crowd Sequential Labeling for Improved Quality
Many natural language processing tasks, such as named entity recognition (NER), part of speech (POS) tagging, word segmentation, and etc., can be formulated as sequential data labeling problems. Building a sound labeler requires very large number of correctly labeled training examples, which may not always be possible. On the other hand, crowdsourcing provides an inexpensive yet efficient alter...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011